add Support Java Class with circular references #37738

1zg12 · 2022-08-31T08:12:11Z

What changes were proposed in this pull request?

If the target Java data class has a circular reference, Spark fails fast when creating the Dataset or running Encoders.

This PR adds an option that lets the developer decide whether to skip the circular field or let the application fail.
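The diff itself is not reproduced in this conversation, so the following is only a rough sketch of the behavior described above, not the PR's implementation. It walks a bean's getter-based properties and either throws on a circular reference or skips that property, depending on a flag. The names `CircularRefSketch`, `inferFields`, and `skipCircular` are hypothetical, and the property discovery is a simplified stand-in for real bean introspection.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch; none of these names come from Spark.
final class CircularRefSketch {

    // A bean whose "next" property points back to its own type.
    public static class Node {
        private int value;
        private Node next;
        public int getValue() { return value; }
        public void setValue(int value) { this.value = value; }
        public Node getNext() { return next; }
        public void setNext(Node next) { this.next = next; }
    }

    // Returns the dotted paths of all leaf properties reachable from beanType.
    // skipCircular is an assumed flag: true drops circular properties, false fails fast.
    static List<String> inferFields(Class<?> beanType, boolean skipCircular) {
        List<String> out = new ArrayList<>();
        walk(beanType, "", new HashSet<>(), skipCircular, out);
        return out;
    }

    private static void walk(Class<?> type, String prefix, Set<Class<?>> seen,
                             boolean skipCircular, List<String> out) {
        seen.add(type); // types on the current path from the root
        for (Map.Entry<String, Class<?>> p : properties(type).entrySet()) {
            String path = prefix + p.getKey();
            Class<?> propType = p.getValue();
            if (isLeaf(propType)) {
                out.add(path);
            } else if (seen.contains(propType)) {
                if (!skipCircular) {
                    throw new UnsupportedOperationException(
                        "Cannot have circular references in bean class, "
                            + "but got the circular reference of class " + propType);
                }
                // skipCircular == true: leave this property out and continue.
            } else {
                walk(propType, path + ".", seen, skipCircular, out);
            }
        }
        seen.remove(type);
    }

    // Simplified property discovery: public zero-argument getXxx() methods, sorted by name.
    private static Map<String, Class<?>> properties(Class<?> type) {
        Map<String, Class<?>> props = new TreeMap<>();
        for (Method m : type.getMethods()) {
            String n = m.getName();
            if (n.startsWith("get") && n.length() > 3 && m.getParameterCount() == 0
                    && m.getDeclaringClass() != Object.class) {
                props.put(Character.toLowerCase(n.charAt(3)) + n.substring(4), m.getReturnType());
            }
        }
        return props;
    }

    private static boolean isLeaf(Class<?> t) {
        return t.isPrimitive() || t == String.class || t == Boolean.class
            || Number.class.isAssignableFrom(t);
    }

    public static void main(String[] args) {
        // With skipping enabled, only the non-circular property remains.
        System.out.println(inferFields(Node.class, true));
    }
}
```

Calling `inferFields(Node.class, false)` throws instead, which corresponds to the existing fail-fast behavior described in this PR.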

Why are the changes needed?

If the target Java data class has a circular reference, Spark fails fast when creating the Dataset or running Encoders.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Issue: https://issues.apache.org/jira/browse/SPARK-33598

AmplabJenkins · 2022-08-31T15:59:31Z

Can one of the admins verify this patch?

srowen · 2022-08-31T16:51:33Z

Hm, skipping them doesn't seem right either. Not sure if this should be an option; it is just something that doesn't make sense to encode

1zg12 · 2022-09-01T03:35:46Z

> Hm, skipping them doesn't seem right either. Not sure if this should be an option; it is just something that doesn't make sense to encode

If the developer/application is comfortable with a field having a self/circular reference, then from Spark's perspective I think it should allow the developer to stop the loop gracefully (that is, to skip further processing of the looping field, at the developer's own judgement).

This PR does not force either behavior (stopping the whole application immediately, as today, or skipping the field if the developer chooses to); it leaves the choice to the developer. Ultimately, the developer building the application has the best knowledge of how to handle it.

I guess Spark assumed that a circular reference must be a mistake made by the developer/application. But it can be a valid case, even if it is rare.

srowen · 2022-09-01T03:39:54Z

Can you describe a valid use case? I can't think of one. Encoders are used with data classes, bean-like classes

1zg12 · 2022-09-01T04:14:48Z

> Can you describe a valid use case? I can't think of one. Encoders are used with data classes, bean-like classes

Google Protobuf is an example; it's widely used as a data class. In the protobuf class there is an attribute called Descriptor (a generated field) which holds a circular reference back.

The current Spark implementation doesn't work with protobuf.

There are some other examples on the issue:
https://issues.apache.org/jira/browse/SPARK-33598

jkhalid · 2022-11-28T23:16:31Z

Hey all,
Is there any update on this PR? I am currently running into an issue where I have a JSON schema that has circular references. I am using Encoders.bean(classOf[Example]) to map the input to the POJO generated from that schema, and it fails because of the circular reference.

venkyvb · 2022-12-20T07:08:19Z

Hey all,
Wondering if this PR (or some similar fix) got merged. I have similar issues with circular references, and it would be great to have an option to skip the check.
PS: My example is not related to protobuf but to the POJO classes generated from the XSD schemas for FHIR: https://hl7.org/fhir/downloads.html
Thanks.

lsgrep · 2023-03-17T08:46:12Z

Hi, I am hitting this circular reference problem while processing Kafka Avro messages with Spark 3.3.0.

Exception in thread "main" java.lang.UnsupportedOperationException: Cannot have circular references in bean class, but got the circular reference of class class org.apache.avro.Schema

https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/Schema.java

lsgrep · 2023-03-17T08:47:21Z

Hi @srowen, would you consider Avro schemas a valid reason for supporting this feature, given that Avro is pretty popular in general? Thanks

srowen · 2023-03-17T10:49:23Z

Still seems weird to me --
Does this happen to even be 'enough' for the protobuf case? Or does this extra unwanted descriptor field add other unneeded cols?
Is it 'too much' - Is it skipping real circular references that matter, but can't translate to tabular schemas?
Is it solvable by just subclassing the bean class and hiding the field that isn't desirable to begin with?
How does the circular ref arise in the Avro case? Is it different?

I get it; it just seems a bit too hacky as the 'right' solution. Sometimes hacks are worth it
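As a rough illustration of the "hide the field" question above: a subclass still inherits the public getter, so bean introspection would typically still see the circular property. A related alternative, sketched here, is a separate flat bean that copies only the tabular properties. All names (`FlatBeanSketch`, `Event`, `EventRow`, `getSelf`) are hypothetical stand-ins, not protobuf, Avro, or Spark classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; this is not Spark, protobuf, or Avro code.
final class FlatBeanSketch {

    // Stand-in for a generated class whose getSelf() makes the type graph circular.
    public static class Event {
        private final String name;
        private final long timestamp;
        public Event(String name, long timestamp) {
            this.name = name;
            this.timestamp = timestamp;
        }
        public String getName() { return name; }
        public long getTimestamp() { return timestamp; }
        public Event getSelf() { return this; } // the unwanted circular property
    }

    // Flat bean that exposes only the properties that map to columns.
    public static class EventRow {
        private String name;
        private long timestamp;
        public EventRow() { }
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public long getTimestamp() { return timestamp; }
        public void setTimestamp(long timestamp) { this.timestamp = timestamp; }

        // Copies the wanted properties and drops the circular one.
        public static EventRow from(Event e) {
            EventRow row = new EventRow();
            row.setName(e.getName());
            row.setTimestamp(e.getTimestamp());
            return row;
        }
    }

    // Converts a batch of events before handing them to a bean-based encoder.
    static List<EventRow> toRows(List<Event> events) {
        List<EventRow> rows = new ArrayList<>();
        for (Event e : events) {
            rows.add(EventRow.from(e));
        }
        return rows;
    }
}
```

With this shape, the flat class rather than the generated class would be the one passed to `Encoders.bean`. The cost is an extra copy per record and a second class to keep in sync with the schema, which is part of why commenters here ask for an option instead.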

github-actions bot added the SQL label Aug 31, 2022

1zg12 force-pushed the master branch 6 times, most recently from 428d510 to 8eb5b06, August 31, 2022 11:38

add option to skip circular ref

9b60db1

1zg12 force-pushed the master branch from 8eb5b06 to 9b60db1, September 1, 2022 03:23

github-actions bot added the ML label Sep 1, 2022

rangadi mentioned this pull request Oct 20, 2022

Protobuf generate V2 and V3 protos and extend tests. #38324

Closed

srowen closed this Dec 5, 2022

6 participants